
25 research outputs found

Hack Weeks as a model for Data Science Education and Collaboration

Author: Arendt Anthony
Hogg David W.
Huppenkothen Daniela
Ram Karthik
Rokem Ariel
VanderPlas Jake
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 31/10/2017

Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective tool for fostering these exchanges by providing training in modern data analysis workflows. While there are variations in hack week implementation, all events consist of a common core of three components: tutorials in state-of-the-art methodology, peer-learning and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out similarities and differences to traditional conferences. We motivate the need for such an event and present in detail its strengths and challenges. We find that hack weeks are successful at cultivating collaboration and the exchange of knowledge. Participants self-report that these events help them both in their day-to-day research as well as their careers. Based on our results, we conclude that hack weeks present an effective, easy-to-implement, fairly low-cost tool to positively impact data analysis literacy in academic disciplines, foster collaboration and cultivate best practices.

Comment: 15 pages, 2 figures, submitted to PNAS, all relevant code available at https://github.com/uwescience/HackWeek-Writeu

arXiv.org e-Print Archive

MPG.PuRe

Classification of Stellar Spectra with LLE

Author: Abazajian
Andrew Connolly
Beauchemin
de Ridder
Deeming
Jake Vanderplas
Jeff Schneider
Liang Xiong
McGurk
Scott F. Daniel
Singh
Vanderplas
Whitney
Yip
Publication venue: 'IOP Publishing'
Publication date: 20/10/2011

We investigate the use of dimensionality reduction techniques for the classification of stellar spectra selected from the SDSS. Using local linear embedding (LLE), a technique that preserves the local (and possibly non-linear) structure within high dimensional data sets, we show that the majority of stellar spectra can be represented as a one dimensional sequence within a three dimensional space. The position along this sequence is highly correlated with spectral temperature. Deviations from this "stellar locus" are indicative of spectra with strong emission lines (including misclassified galaxies) or broad absorption lines (e.g. Carbon stars). Based on this analysis, we propose a hierarchical classification scheme using LLE that progressively identifies and classifies stellar spectra in a manner that requires no feature extraction and that can reproduce the classic MK classifications to an accuracy of one type.

Comment: 15 pages, 13 figures; accepted for publication in The Astronomical Journa

arXiv.org e-Print Archive

Crossref

SNANA: A Public Software Package for Supernova Analysis

Author: Bernstein Joseph P.
Cinabro David
Dilday Benjamin
Frieman Joshua A.
Jha Saurabh
Kessler Richard
Kuhlmann Stephen
Miknaitis Gajus
Sako Masao
Taylor Matt
Vanderplas Jake
Publication venue: 'University of Chicago Press'
Publication date: 28/08/2009

We describe a general analysis package for supernova (SN) light curves, called SNANA, that contains a simulation, light curve fitter, and cosmology fitter. The software is designed with the primary goal of using SNe Ia as distance indicators for the determination of cosmological parameters, but it can also be used to study efficiencies for analyses of SN rates, estimate contamination from non-Ia SNe, and optimize future surveys. Several SN models are available within the same software architecture, allowing technical features such as K-corrections to be consistently used among multiple models, and thus making it easier to make detailed comparisons between models. New and improved light-curve models can be easily added. The software works with arbitrary surveys and telescopes and has already been used by several collaborations, leading to more robust and easy-to-use code. This software is not intended as a final product release, but rather it is designed to undergo continual improvements from the community as more is learned about SNe. Below we give an overview of the SNANA capabilities, as well as some of its limitations. Interested users can find software downloads and more detailed information from the manuals at http://www.sdss.org/supernova/SNANA.html .

Comment: Accepted for publication in PAS

arXiv.org e-Print Archive

Crossref

Tests of Modified Gravity with Dwarf Galaxies

Author: A.C. Davis
B. Jain
Bhuvnesh Jain
C.M. Will
J.F. Navarro
Jake VanderPlas
L. Lombriser
P. Chang
R. Swaters
Publication venue: 'IOP Publishing'
Publication date: 31/05/2011

In modified gravity theories that seek to explain cosmic acceleration, dwarf galaxies in low density environments can be subject to enhanced forces. The class of scalar-tensor theories, which includes f(R) gravity, predict such a force enhancement (massive galaxies like the Milky Way can evade it through a screening mechanism that protects the interior of the galaxy from this "fifth" force). We study observable deviations from GR in the disks of late-type dwarf galaxies moving under gravity. The fifth-force acts on the dark matter and HI gas disk, but not on the stellar disk owing to the self-screening of main sequence stars. We find four distinct observable effects in such disk galaxies: 1. A displacement of the stellar disk from the HI disk. 2. Warping of the stellar disk along the direction of the external force. 3. Enhancement of the rotation curve measured from the HI gas compared to that of the stellar disk. 4. Asymmetry in the rotation curve of the stellar disk. We estimate that the spatial effects can be up to 1 kpc and the rotation velocity effects about 10 km/s in infalling dwarf galaxies. Such deviations are measurable: we expect that with a careful analysis of a sample of nearby dwarf galaxies one can improve astrophysical constraints on gravity theories by over three orders of magnitude, and even solar system constraints by one order of magnitude. Thus effective tests of gravity along the lines suggested by Hui et al (2009) and Jain (2011) can be carried out with low-redshift galaxies, though care must be exercised in understanding possible complications from astrophysical effects.

Comment: 26 pages, 9 figure

arXiv.org e-Print Archive

Crossref

API design for machine learning software: experiences from the scikit-learn project

Author: Blondel Mathieu
Buitinck Lars
Gramfort Alexandre
Grisel Olivier
Grobler Jaques
Holt Brian
Joly Arnaud
Layton Robert
Louppe Gilles
Mueller Andreas
Niculae Vlad
Pedregosa Fabian
Prettenhofer Peter
Vanderplas Jake
Varoquaux Gaël
Publication date: 01/09/2013

Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
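The shared interface and its benefit for composition can be illustrated with a minimal sketch. The classes below (ToyCenterer, ToyCentroidClassifier, ToyPipeline) are hypothetical stand-ins written for this illustration, not scikit-learn code; they only assume the library's well-known convention that units expose fit, transform, and predict methods and store learned state in attributes ending with an underscore.

```python
from statistics import mean


class ToyCenterer:
    """Hypothetical transformer: subtracts the per-feature mean learned in fit()."""

    def fit(self, X, y=None):
        # Learned state is stored in a trailing-underscore attribute.
        self.means_ = [mean(col) for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class ToyCentroidClassifier:
    """Hypothetical classifier: predicts the label of the closest class centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda lab: sq_dist(row, self.centroids_[lab]))
            for row in X
        ]


class ToyPipeline:
    """Hypothetical composite: chains transformers and a final predictor.

    Because every step follows the same fit/transform/predict convention,
    the composite itself can expose that convention and be used like any
    single unit.
    """

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


# Made-up toy data with two well-separated classes.
X_train = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
y_train = ["a", "a", "b", "b"]

model = ToyPipeline([ToyCenterer(), ToyCentroidClassifier()]).fit(X_train, y_train)
predictions = model.predict([[0.1, 0.1], [4.8, 5.0]])
print(predictions)
```

Since fit returns the object itself, construction and training can be chained in one expression, and swapping one step for another requires no change to the surrounding code as long as the new step follows the same convention.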

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Federation ResearchOnline

HAL-CEA

Scikit-learn: Machine Learning in Python

Author: Blondel Mathieu
Brucher Matthieu
Cournapeau David
Dubourg Vincent
Duchesnay Edouard
Gramfort Alexandre
Grisel Olivier
Michel Vincent
Passos Alexandre
Pedregosa Fabian
Perrot Matthieu
Prettenhofer Peter
Thirion Bertrand
Vanderplas Jake
Varoquaux Gaël
Weiss Ron
Publication venue: Microtome Publishing
Publication date: 12/10/2011

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net

HAL Clermont Université

INRIA a CCSD electronic archive server

HAL-CEA